Abstract


The leading approach for computing strong game-theoretic 
strategies in large imperfect-information games is to first 
solve an abstracted version of the game offline, then per- 
form a table lookup during game play. We consider a mod- 
ification to this approach where we solve the portion of the 
game that we have actually reached in real time to a greater 
degree of accuracy than in the initial computation. We call 
this approach endgame solving. Theoretically, we show that 
endgame solving can produce highly exploitable strategies in 
some games; however, we show that it can guarantee a low 
exploitability in certain games where the opponent is given 
sufficient exploitative power within the endgame. Further- 
more, despite the lack of a general worst-case guarantee, we 
describe many benefits of endgame solving. We present an 
efficient algorithm for performing endgame solving in large 
imperfect-information games, and present a new variance- 
reduction technique for evaluating the performance of an 
agent that uses endgame solving. Experiments on no-limit 
Texas Hold’em show that our algorithm leads to significantly 
stronger performance against the strongest agents from the 
2013 AAAI Annual Computer Poker Competition. 


1 Introduction 


Sequential games of perfect information can be solved in 
linear time by a straightforward backward induction pro- 
cedure in which solutions to endgames are propagated up 
the game tree.¹ However, this procedure does not work in
general in imperfect-information games because different 
endgames can contain nodes that belong to the same infor- 
mation set and cannot be treated independently. More so- 
phisticated algorithms are needed for this class of games. 
One algorithm for solving two-player zero-sum imperfect- 
information games is based on a linear program (LP) for- 
mulation (Koller, Megiddo, and von Stengel 1994), which 
scales to games with around 10^8 nodes in their game
tree (Gilpin and Sandholm 2006). Many interesting games
are significantly larger; for example, two-player limit Texas
Hold’em has about 10^17 nodes, and a popular variant of
two-player no-limit Texas Hold’em has about 10^165 nodes
(Johanson 2013). To address such large games, newer
approximate equilibrium-finding algorithms have been developed
that scale to games with around 10^14 nodes, such as
counterfactual regret minimization (CFR) (Zinkevich et al. 2007)
and an algorithm based on the excessive gap technique
(EGT) (Hoda et al. 2010). These algorithms are iterative and
guarantee convergence to equilibrium in the limit.

¹ Prior work has demonstrated that precomputing solutions to
endgames offline can be effective in large perfect-information
games (Bellman 1965; Schaeffer et al. 2003). In contrast, we solve
endgames online.

The leading approach for solving extremely large games
such as Texas Hold’em (TH)² is to abstract the game down
to a game with only around 10^12 nodes, then to compute
an approximate equilibrium in the abstract game using one 
of the algorithms described above (Billings et al. 2003; 
Gilpin and Sandholm 2006). In order to perform such a 
dramatic reduction in size, significant abstraction is often 
needed. Information (aka card) abstraction involves reduc- 
ing the number of nodes by bundling signals (e.g., forcing 
a player to play the same way with two different hands), 
and action (aka betting) abstraction involves reducing the 
number of actions by discretizing large action spaces into a 
small number of actions. All of the computation (both for 
constructing the abstraction and computing an approximate 
equilibrium in the abstraction) is done offline, and a table 
lookup is performed in real time to implement the strategy. 

We consider a modification to this approach where we re- 
tain the abstract equilibrium strategies for the initial portion 
of the game tree (called the trunk), and discard the strate- 
gies for the final portion (called the endgames). Then, in real 
time, we solve the relevant endgame that we have reached 
to a greater degree of accuracy than the initial abstract strat- 
egy, where we use Bayes’ rule to compute the distribution of 
players’ private information leading into the endgames from 
the precomputed trunk strategies. This approach, which we 
call endgame solving, is depicted in Figure 1. 

We present the first theoretical analysis of endgame solv- 
ing in imperfect-information games, and show that it can ac- 
tually produce highly exploitable strategies in some games. 
In fact, we show that it can fail even in a simple game with 
a unique equilibrium and a single endgame, even if our base 
strategy were an exact equilibrium (of the full game) and we 
were able to compute an exact equilibrium in the endgame.
However, we show that endgame solving can guarantee a
low exploitability (difference between game value and payoff
against a nemesis) in some games when the opponent is
given sufficient exploitative power within the endgame.

² See Appendix A for background on Texas Hold’em poker.

Figure 1: Endgame solving (re-)solves the relevant endgame
that we have actually reached in real time to a greater degree
of accuracy than in the offline computation.

Endgame solving has been used by several prior agents 
for the limit variation of TH (where bets must be of a sin- 
gle fixed size). The agent GS1 precomputed strategies only 
for the first two rounds, using rough approximations for 
the payoffs at the leaves of that trunk based on the (un- 
realistic) assumption that there was no betting in future 
rounds (Gilpin and Sandholm 2006). Then in real time, the 
relevant endgame consisting of the final two rounds was 
solved using the LP algorithm. GS2 precomputed strate- 
gies for the first three rounds, using simulations to estimate 
the payoffs at the leaves of that trunk; it then solved the 
endgames for the final two rounds in real time (Gilpin and 
Sandholm 2007). 

However, endgame solving has not been implemented by
any competitive agents for the significantly larger and more 
challenging domain of no-limit Texas Hold’em (NLTH) 
prior to our work. We present a new algorithm that is ca- 
pable of scaling to extremely large games such as no-limit 
Texas Hold’em, and incorporates several algorithmic im- 
provements over the prior approaches (the benefits described 
in this paragraph would be improvements over the prior 
approaches even for the limit variant). First, the prior ap- 
proaches assume that the private hand distributions lead- 
ing into the endgame are independent, while they are ac- 
tually dependent and the full joint distribution should be 
computed. The naive way of accomplishing this would
require O(n^2) strategy table lookups, where n is the number
of private hands (1081 for the final round of poker), and
computing these distributions would become the bottleneck 
of the algorithm and make the real-time computation in- 
tractable; however, we developed a technique for computing 
the joint distributions that requires just O(n) strategy table 
lookups. Second, the prior approaches use a single perfect- 
recall card abstraction that has been precomputed offline 
(which assumes a uniform random distribution for the
opponent’s hand distributions). In contrast, we use an
imperfect-recall card abstraction³ that is computed in real time at a
finer granularity than the initial offline abstraction and that is
tailored specifically to the relevant distribution of the
opponent’s hands at the given hand history. Furthermore, the prior
approaches did not compare performance between endgame
solving and not using it (since the base strategies were not
computed for the endgames), while we provide such a
comparison.

³ Imperfect-recall abstractions allow for greater flexibility in
which hands can be grouped together, and have been shown
to significantly improve performance over perfect-recall
abstractions (Waugh et al. 2009; Johanson et al. 2013).

Very recent work, which appeared subsequently to the 
first version of this work, has presented approaches for de- 
composing imperfect-information games into smaller games 
that can be solved independently offline, and provides some 
theoretical guarantees on full-game exploitability. One of 
these approaches has only been applied to the small do- 
main of limit Leduc Hold’em, which has 936 information 
sets in its game tree, and is not practical for larger games 
such as NLTH due to its running time (Burch, Johanson, 
and Bowling 2014). A second related (offline) approach in- 
cludes counterfactual values for game states that could have 
been reached off the path to the endgames (Jackson 2014). 
This approach has been demonstrated to be effective in limit 
Leduc Hold’em, and has also been implemented in NLTH, 
though no experimental results are given for that domain. 
For NLTH, it is implemented by first solving the game in 
a coarse abstraction, then fixing the strategies for the pre- 
flop (first) round, and re-solving for certain endgames start- 
ing at the flop (second round) after common preflop bet- 
ting sequences have been played. All of this computation 
is done offline. In contrast, our approach enables us to solve 
endgames at the river (final round) in real time. It is infeasi- 
ble to solve the river endgames using the prior approach for 
several reasons. First, there are far too many of them to be 
solved individually in advance (there is a different one for 
each sequence of public cards and betting actions). Second, 
by the time play gets down to the river, there are many possi- 
ble alternative actions that a player could have taken to avoid 
reaching the given endgame, and counterfactual values for 
each of these would need to be computed and then included 
in the solution to the endgame solver; this would likely be in- 
feasible to do in real time. Solving the river endgames, as op- 
posed to the flop endgames which the prior approach does, 
is very important because CFR only occasionally samples 
from a specific river endgame during the course of the initial 
equilibrium computation, while it very frequently samples 
from the flop endgames that follow common preflop betting 
sequences. So, our approach is addressing a more pressing 
limitation. 

Our approach has significant benefits over the standard 
approach for solving large imperfect-information games, 
including computation of exact (rather than approximate) 
equilibrium strategies (within a given abstraction), the abil- 
ity to compute certain equilibrium refinements that cannot 
be computed in the full offline computation, finer-grained 
abstraction in the endgames, abstraction that takes into ac- 
count realistic distributions of players’ private information 
entering the endgame (as opposed to the typical assumption 
of uniform random distributions), and a solution to the “off- 
tree” problem that arises when the opponent has taken ac- 
tions that are not allowed in the abstraction. We present an 
efficient algorithm for performing endgame solving in large 
imperfect-information games, and present a novel variance- 
reduction technique for evaluating the performance of an 
agent that uses endgame solving. Experiments on no-limit 
Texas Hold’em show that using our algorithm leads to
significantly stronger performance against the strongest 2013
poker competition agents.


2 Endgame Solving 


Definition 1. E is an endgame of game G if the following 
two properties hold: 


1. If s' is a child of s in G and s is a node in E, then s' is
also a node in E.


2. If s is in the same information set as s' in G and s is a
node in E, then s' is also a node in E.
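
The two closure properties of Definition 1 can be checked mechanically on a small tree. The following is a minimal sketch, not part of the original work: the function `is_endgame`, the dictionary encoding of the tree, and the node labels are all hypothetical choices made for illustration. The toy tree is the sequential Rock-Paper-Scissors game used later in this section.

```python
def is_endgame(candidate, children, info_set):
    """Return True if `candidate` satisfies both properties of Definition 1.

    children: maps a node to the list of its children (leaves may be absent).
    info_set: maps a decision node to the label of its information set.
    """
    candidate = set(candidate)

    # Property 1: every child of a node in E is also in E.
    for s in candidate:
        for child in children.get(s, []):
            if child not in candidate:
                return False

    # Property 2: every node sharing an information set with a node in E is in E.
    members = {}
    for node, label in info_set.items():
        members.setdefault(label, set()).add(node)
    for s in candidate:
        label = info_set.get(s)
        if label is not None and not members[label] <= candidate:
            return False
    return True


# Toy tree: player 1 picks R, P, or S at the root; player 2 then acts
# without observing that choice, so R, P, and S share one information set.
children = {"root": ["R", "P", "S"]}
for a in "RPS":
    children[a] = [a + b for b in "RPS"]
info_set = {"root": "P1", "R": "P2", "P": "P2", "S": "P2"}

leaves = {a + b for a in "RPS" for b in "RPS"}
p2_subtree = {"R", "P", "S"} | leaves          # all of player 2's nodes plus leaves
after_rock_only = {"R", "RR", "RP", "RS"}      # splits player 2's information set
```

Here `p2_subtree` passes both checks, while `after_rock_only` fails the second property because it contains only one of the three nodes in player 2's information set.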


For example, we can consider endgames in poker where 
several rounds of betting have taken place and several pub- 
lic cards have already been dealt. In these endgames, we can 
assume players have a joint distribution of private informa- 
tion from nodes prior to the endgame that are induced from 
the precomputed base approximate-equilibrium strategy us- 
ing Bayes’ rule. Given this distribution as input, we can then 
solve individual endgames in real time using more accurate 
abstractions. 

Unfortunately, this approach has some fundamental theo- 
retical shortcomings. It turns out that even if we computed 
an exact equilibrium in the trunk (which is an unrealistically 
optimistic assumption in large games) and in the endgame, 
the combined strategies for the trunk and endgame may fail 
to be an equilibrium in the full game. One obvious reason 
for this is that the game may contain many equilibria, and 
we might choose one for the trunk that does not match up 
correctly with the one for the endgame; or we may compute 
different equilibria in different endgames that do not balance 
appropriately. However, Proposition 1 shows that it is pos- 
sible for this procedure to output a non-equilibrium strategy 
profile in the full game even if the full game has a unique 
equilibrium and a single endgame. 


Proposition 1. There exist games—even with a unique equi- 
librium and a single endgame—for which endgame solving 
can produce a non-equilibrium strategy profile. 


Proof. Consider a sequential version of Rock-Paper-Scis- 
sors where player 1 acts, then player 2 acts without observ- 
ing player 1’s action. This game has a single endgame— 
when it is player 2’s turn to act—and a unique equilibrium— 
where each player plays each action with probability 1/3. Now
suppose we restrict player 1 to follow the equilibrium in 
the initial portion of the game. Any strategy for player 2 
is an equilibrium in the endgame, because each one yields 
her expected payoff 0. In particular, suppose our equilibrium 
solver outputs the pure strategy Rock for her. This is clearly 
not an equilibrium of the full game. 
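
The argument in the proof can be verified numerically. The sketch below is an illustration only; the payoff table uses the usual win = 1, loss = -1, tie = 0 convention, and all function and variable names are hypothetical. It confirms that every player 2 strategy earns 0 in the endgame when player 1's trunk strategy is fixed to uniform, yet pure Rock is exploitable by 1 once player 1 is free to deviate in the trunk.

```python
from fractions import Fraction

ACTIONS = ("Rock", "Paper", "Scissors")

# PAYOFF[a1][a2] is player 1's payoff; player 2 receives the negation.
PAYOFF = {
    "Rock": {"Rock": 0, "Paper": -1, "Scissors": 1},
    "Paper": {"Rock": 1, "Paper": 0, "Scissors": -1},
    "Scissors": {"Rock": -1, "Paper": 1, "Scissors": 0},
}


def p1_value(p1, p2):
    """Expected payoff to player 1 under mixed strategies p1 and p2."""
    return sum(p1[a] * p2[b] * PAYOFF[a][b] for a in ACTIONS for b in ACTIONS)


def pure(action):
    """Mixed-strategy representation of a pure action."""
    return {a: Fraction(int(a == action)) for a in ACTIONS}


uniform = {a: Fraction(1, 3) for a in ACTIONS}

# Endgame view: player 1 is fixed to uniform, so player 2 is indifferent.
endgame_values = [-p1_value(uniform, pure(b)) for b in ACTIONS]

# Full-game view: player 1 may best-respond to player 2's pure Rock.
p2_rock = pure("Rock")
best_response_value = max(p1_value(pure(a), p2_rock) for a in ACTIONS)
game_value = 0  # value of Rock-Paper-Scissors to player 1
exploitability_of_rock = best_response_value - game_value
```

The endgame solver sees three equally good pure strategies for player 2, while the full-game exploitability of Rock is the largest possible in this game.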


Rock-Paper-Scissors (RPS) is somewhat of an extreme
example though, because player 1 does not actually make
any moves in the endgame. At the other extreme, if the
endgame were the entire game, then endgame solving would
produce an exact equilibrium. As a slightly less extreme
example, consider the game in Figure 2, where P1 selects
an action a_i, and then a sequential imperfect-information
game G_i is played. Suppose we are solving endgames
after P1’s initial action. Then we will solve the endgame G_i
and produce strategies with zero exploitability in the full
game. Endgame solving could be very useful in this game
for several reasons. First, if the number of initial actions n
for P1 were extremely large, it may be infeasible to solve
and/or store solutions to all of the endgames in advance of
game play. Endgame solving would only require solving the
endgames that are actually reached during game play, and
would be feasible even if n is extremely large as long as the
number of game repetitions were relatively small. Second,
the typical approach would not even involve
solving each of the G_i separately in advance; it would be to
solve the full game, which includes each of the G_i as well
as P1’s initial actions. It is very possible that
equilibrium-finding algorithms would not scale to the full game and/or it
would not fit in memory, while equilibria could be computed
quickly and fit into memory for the individual endgames G_i.


Figure 2: Player 1 selects his action a_i, then the players play
imperfect-information game G_i.


One could imagine much more complex trunk games than 
the above example with imperfect information and multi- 
ple actions for both players where it is difficult to know 
for sure how “important” the trunk strategies are for the 
endgames. In such games, it may be possible for endgame 
solving to still guarantee a reasonably low exploitability in 
the full game. As Proposition 2 shows, in general, the more
exploitative power the opponent has within the endgame, the
lower the full-game exploitability of the strategies produced
by (approximate) endgame solving.


Proposition 2. If every strategy that has exploitability
strictly more than ε in the full game has exploitability of
strictly more than δ within the endgame, then the strategy
output by a solver that computes a δ-equilibrium in the
endgame induced by a trunk strategy t would constitute an
ε-equilibrium of the full game when paired with t.


Proof. Suppose a strategy is a δ-equilibrium in the endgame
induced by t, but not an ε-equilibrium in the full game when
paired with t. Then by assumption, it has exploitability of
strictly more than δ within the endgame, which leads to a
contradiction.


Intuitively, Proposition 2 says that endgame solving pro- 
duces strategies with low exploitability in games where the 
endgame is a significant strategic portion of the full game, 
that is, in games where any endgame strategy with high full- 
game exploitability can be exploited by the opponent by 
modifying his strategy just within the endgame. 

One could classify different games according to how they 
fall regarding the premise of Proposition 2, given a subdi- 
vision of the game into a trunk and endgames, and given 
fixed strategies for the trunk. If the premise is satisfied, then
we can say that the game/subdivision satisfies the (ε, δ)-
endgame property. An interesting property would be the
smallest value ε*(δ) such that the game satisfies the (ε, δ)-
endgame property for a given δ. For instance, the game in
Figure 2 would have ε*(δ) = δ for all δ > 0, while RPS
would only have ε*(δ) = 1 for each δ > 0. While
Proposition 2 is admittedly somewhat trivial, such a classification
could be useful in developing a better understanding of when
endgame solving would be helpful in general.
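
One way to write this quantity explicitly, offered as a sketch in notation that does not appear in the original: let expl_G(σ) be the full-game exploitability of an endgame strategy σ when paired with the fixed trunk strategy t, and let expl_E(σ) be its exploitability within the endgame induced by t. Taking the contrapositive of the premise of Proposition 2, the (ε, δ)-endgame property says that expl_E(σ) ≤ δ implies expl_G(σ) ≤ ε, so the smallest admissible ε is a supremum:

```latex
\[
  \epsilon^*(\delta)
  \;=\;
  \sup \bigl\{\, \mathrm{expl}_G(\sigma) \;:\; \mathrm{expl}_E(\sigma) \le \delta \,\bigr\}.
\]
```

In sequential RPS every player 2 strategy σ has expl_E(σ) = 0, and pure Rock has expl_G = 1, which is consistent with ε*(δ) = 1 under this reading.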


3 Benefits of Endgame Solving 


Even though we showed in the previous section that end- 
game solving may lead to highly exploitable strategies in 
some games, it has many clear benefits in large imperfect- 
information games, which we now describe. These bene- 
fits and techniques are enabled by using endgame solving 
(rather than being techniques that help alongside endgame 
solving). 


3.1 Exact Computation of Nash Equilibrium in 
Abstracted Endgames 


The best algorithms for computing approximate equilibria 
in large games of imperfect information scale to games with 
about 10^14 nodes. However, they are iterative and guarantee
convergence only in the limit; in practice they only produce 
approximations of equilibrium strategies (within a given ab- 
straction). Sometimes the approximation error can be quite 
large. For example, one recent NLTH agent reported having 
an exploitability of 800 milli big blinds per hand (mbb/h) 
even within the abstract game (Ganzfried and Sandholm 
2012). This is extremely large, since an agent that folds ev- 
ery hand would only have an exploitability of 750 mbb/h. 
The best general-purpose LP algorithms find an exact
equilibrium, though they only scale to games with around 10^8
nodes (Gilpin and Sandholm 2006). While the LP algorithms
do not scale to reasonable abstractions of full TH, we can
use them to exactly solve abstracted endgames that have up
to around 10^8 nodes. We do exact endgame solving in the
experiments.


3.2 Ability to Compute Certain Equilibrium 
Refinements 


The Nash equilibrium (NE) solution concept has some theo- 
retical limitations, and several equilibrium refinements have 
been proposed which rule out NEs that are not rational in 
various senses. In general, these solution concepts guaran- 
tee that we behave sensibly against an opponent who does 
not follow his prescribed equilibrium strategy (i.e., he takes 
actions that should be taken with probability zero in equi- 
librium). Specialized algorithms have been developed for 
computing many of these concepts (Miltersen and Sørensen
2006; 2008; 2010). However, those algorithms do not scale 
to large games. In TH, computing a reasonable approxima- 
tion of a single Nash equilibrium already takes months (us- 
ing the leading algorithms, CFR or EGT), and there are no 
known algorithms for computing any of the common re- 
finements that scale to games of that size. However, when 
solving endgames that are significantly smaller than the full
game, it can be possible to compute certain refinements.
An undominated Nash equilibrium (UNE) can be computed 
by solving two LPs instead of one, and an ε-quasi-perfect
equilibrium by solving a single LP (though the second one
is not technically a refinement and has documented numer- 
ical stability issues). We have implemented algorithms for 
computing both of these on large NLTH endgames, which 
demonstrates for the first time that they are feasible to com- 
pute in imperfect-information games of this magnitude. Pre- 
liminary experiments indicate that in NLTH endgames UNE 
is useful, though those results were not statistically signifi- 
cant, so we do not report on those experiments here. 


3.3 Finer-Grained, History-Aware, and 
Strategy-Biased Abstraction 


Another important benefit of endgame solving in large 
games is that we can compute better abstractions in the 
endgame that is actually played than if we are forced to ab- 
stract the entire game at once in advance. In addition to al- 
lowing us to compute finer-grained abstractions, endgame 
solving enables us to compute an abstraction specifically for 
the situation at hand. In other words, we can condition the 
abstraction on the path of play so far (both the players’ ac- 
tions and nature’s actions). For example, in poker, we can 
condition the abstraction on the betting history (which of- 
fline game-solving approaches do not do) and on the board 
cards (which offline game-solving approaches cannot afford 
to do at an equally fine granularity). 

The standard approach for performing information ab- 
straction is to bucket information sets together for hands 
that perform similarly against a uniform distribution of the 
opponent’s private information (Gilpin and Sandholm 2006; 
Johanson et al. 2013). However, the assumption that the op- 
ponent has a hand uniformly at random is extremely unre- 
alistic in many situations; for example, if the opponent has 
called large bets throughout the hand, he is unlikely to hold 
a very weak hand. Ideally, a successful information abstrac- 
tion algorithm would group hands together that perform sim- 
ilarly against the relevant distribution of hands the opponent 
actually has—not a naive uniform random distribution. For- 
tunately, we can accomplish such strategy-biased informa- 
tion abstraction in endgames. Our algorithm is detailed in 
Section 4. 


3.4 A Solution to the Off-Tree Problem 


When we perform action abstraction, the opponent may take 
an action that falls outside of our action model for him. 
When this happens, an action translation mapping (aka re- 
verse mapping) is necessary to interpret his action by map- 
ping it to an action in our model (Ganzfried and Sandholm 
2013; Schnizlein, Bowling, and Szafron 2009). However, 
this mapping may ignore relevant game state information. In 
poker, action translation works by mapping a bet of the op- 
ponent to a ‘nearby’ bet size in our abstraction; however, it 
does not account for the size of the pot or remaining stacks. 
For example, suppose remaining stacks are 17,500, the pot 
is 5,000, and our abstraction allows for bets of size 5,000 
and 17,500. Suppose the opponent bets 10,000, which we
map to 5,000 (if we use a randomized mapping, we will do
this with some probability). So we map his action to 5,000, 
and simply play as if he had bet 5,000. If we call his bet, we 
will think the pot has 15,000 and stacks are 12,500. How- 
ever, in reality the pot has 25,000 and stacks are 7,500. These 
two situations are completely different and should be played 
very differently (for example, we should be more reluctant 
to bluff in the latter case because the opponent will be get- 
ting much better odds to call). This is known as the off-tree 
problem. Even if one is using a very sophisticated translation 
algorithm, one will run into the off-tree problem. 

When performing endgame solving in real time, we can 
solve the off-tree problem completely. Regardless of the ac- 
tion translation used to interpret the opponent’s actions prior 
to the endgame, we can take the stack and pot sizes (or 
any other relevant game state information) as inputs to the 
endgame solver. Our endgame solver in poker takes the cur- 
rent pot size, stack sizes, and prior distributions of the cards 
of both players as inputs. Therefore, even if we mapped the 
opponent’s action to 5,000 in the above example, we cor- 
rect the pot size to 25,000 (and the stack sizes accordingly) 
before solving the endgame. 
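
The arithmetic of the example above can be reproduced in a few lines. This is an illustrative sketch only: the helper names are hypothetical, both players are assumed to have equal remaining stacks, and the break-even figure assumes a subsequent all-in bet of the remaining stack, which is an added assumption used to make the "better odds to call" remark concrete.

```python
def state_after_call(pot, stack, bet):
    """Pot and remaining (equal) stacks after a bet of size `bet` is called."""
    return pot + 2 * bet, stack - bet


def call_break_even(pot, bet):
    """Fraction of the time a caller must win to break even against `bet`."""
    return bet / (pot + 2 * bet)


pot, stack = 5_000, 17_500
actual_bet, mapped_bet = 10_000, 5_000

# State the abstraction believes it is in after translating 10,000 to 5,000.
believed = state_after_call(pot, stack, mapped_bet)
# True state, which is what gets passed to the real-time endgame solver.
actual = state_after_call(pot, stack, actual_bet)

# Odds the opponent would face against an all-in bluff in each state.
believed_odds = call_break_even(*believed)
actual_odds = call_break_even(*actual)
```

In the believed state the opponent needs to win 31.25% of the time to call an all-in, but in the actual state only 18.75%, which is why bluffing is less attractive there.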


4 Endgame Solving Algorithm 

In this section we present our algorithm for endgame solv- 
ing in imperfect-information games with very large state and 
action spaces. Pseudocode is given in Algorithm 1. The core 
algorithm is domain independent, although we present the 
signals as card-playing hands for concreteness. An example 
poker hand illustrating each step of the algorithm is given in 
Appendix B. 


Algorithm 1 Algorithm for endgame solving


Inputs: number of information buckets per agent k_i;
abstraction parameter T; action abstractions B_i with b_i action
sequences; clustering algorithms C_i; equilibrium-finding
algorithm Q; number of private hands H; hand rankings R[]
Compute joint hand-strength distribution D[i][j]
E_1, E_2 ← array of dimension H of zeroes
for h_1 = 1 to H do
    r_1 ← R[h_1]
    s_1, s_2 ← 0
    for h_2 = 1 to H do
        r_2 ← R[h_2]
        s_1 += D[h_1][h_2], s_2 += D[h_2][h_1]
        if r_2 < r_1 then
            E_1[h_1] += D[h_1][h_2], E_2[h_1] += D[h_2][h_1]
        else if r_1 == r_2 then
            E_1[h_1] += D[h_1][h_2] / 2, E_2[h_1] += D[h_2][h_1] / 2
    E_1[h_1] = E_1[h_1] / s_1, E_2[h_1] = E_2[h_1] / s_2
k_i ← ⌊T / b_i⌋ for i = 1, 2
A_i ← information abstraction created by clustering elements
    of E_i into k_i buckets using C_i for i = 1, 2
Solve game with information abstractions A_i and action
    abstractions B_i using Q


The first step is to compute the joint input distribution of
private information using Bayes’ rule. The naive approach
for doing this would require iterating over all possible
private hand combinations h_1, h_2 for the players, and for each
pair looking up the probability that the base agent would
have taken the given action sequence. This requires O(n^2)
lookups to the strategy table, where n is the number of
possible hands (n = 1081 for the final round in poker). It
turns out that this computation would become the bottle- 
neck of the entire endgame-solving algorithm and would 
make real-time endgame solving computationally infeasible. 
For this reason, prior approaches for endgame solving have 
made the (significantly) simplifying assumption that the 
distributions are independent (Gilpin and Sandholm 2006; 
2007). However, we developed an algorithm that does this 
with just O(n) table lookups. Pseudocode for our algorithm 
is given in Algorithm 2. 


Algorithm 2 Algorithm for computing hand distributions


Inputs: Public board B; number of possible private hands
H; betting history of current hand h; array of index conflicts
IC[][]; base strategy s*
D_1, D_2 ← array of dimension H of zeroes
for p_1 = 0 to 50, p_1 not already on B do
    for p_2 = p_1 + 1 to 51, p_2 not already on B do
        I ← IndexFull(B, p_1, p_2)
        IndexMap[I] ← IndexHoles(p_1, p_2)
        P_1 ← probability P1 would play according to h
            with p_1, p_2 in s*
        P_2 ← probability P2 would play according to h
            with p_1, p_2 in s*
        D_1[I] += P_1, D_2[I] += P_2
Normalize D_1 and D_2 so all entries sum to 1
for i = 0 to H do
    for j = 0 to H do
        if !IC[IndexMap[i]][IndexMap[j]] then
            D[i][j] ← D_1[i] · D_2[j]
        else
            D[i][j] = 0
Normalize D so all entries sum to 1
return D


In short, the algorithm first computes the distributions 
separately for each player (as done by the independent
approach), then multiplies the probabilities together for hands
that do not share a common card (and sets the joint
probability to zero otherwise). In order to make sure hands are
indexed properly in the array, we must make use of two helper
indexing functions, Algorithms 3 and 4. The former gives an 
algorithm for indexing the two-card private hands, and the 
latter gives an algorithm for indexing the 7-card river hand 
consisting of the two private cards and five public cards. 
Then, in Algorithm 2, we iterate over all sets of private hands 
(p_1, p_2), and create an array called IndexMap that maps the
7-card hand index to the corresponding 2-card hand index. 
In the course of this loop, we also look up the probability 
that each player would play according to the observed bet- 
ting history in the precomputed trunk strategies, which we 
then normalize in accordance with Bayes’ rule. 
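
The two indexing helpers referred to above (Algorithms 3 and 4) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes cards are numbered 0 to 51, and it reads Algorithm 4 as applying the same binomial formula as Algorithm 3 after subtracting from each hole card the number of board cards below it. The function names mirror `IndexHoles` and `IndexFull` from Algorithm 2, and the example board is arbitrary.

```python
from itertools import combinations
from math import comb


def index_holes(h1, h2):
    """Index of a two-card private hand among all C(52, 2) = 1326 hands."""
    if h2 < h1:
        h1, h2 = h2, h1
    return comb(h2, 2) + comb(h1, 1)


def index_full(board, h1, h2):
    """Index of a private hand among the C(47, 2) = 1081 hands possible on `board`."""
    if h2 < h1:
        h1, h2 = h2, h1
    # Count the board cards ranked below each hole card, then re-rank the
    # hole cards as if the five board cards had been removed from the deck.
    n1 = sum(1 for b in board if b < h1)
    n2 = sum(1 for b in board if b < h2)
    return comb(h2 - n2, 2) + comb(h1 - n1, 1)


board = [3, 17, 20, 41, 50]  # arbitrary example board
live_cards = [c for c in range(52) if c not in board]

# Both mappings should hit every index exactly once.
hole_indices = sorted(index_holes(a, b) for a, b in combinations(range(52), 2))
full_indices = sorted(index_full(board, a, b) for a, b in combinations(live_cards, 2))
```

Under these assumptions both functions are bijections onto contiguous ranges, which is what allows the distributions to be stored in flat arrays of dimension H.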

Algorithm 3 Algorithm for computing private hand index
Inputs: Private hole cards h_1, h_2
if h_2 < h_1 then
    t ← h_1
    h_1 ← h_2
    h_2 ← t
return C(h_2, 2) + C(h_1, 1), where C(a, b) is the binomial coefficient


Algorithm 4 Algorithm for computing index of 7-card
hands on a given board


Inputs: Private hole cards h_1, h_2, board B consisting of five
public cards
if h_2 < h_1 then
    t ← h_1
    h_1 ← h_2
    h_2 ← t
n_1 ← 0, n_2 ← 0
for i = 1 to 5 do
    for j = 1 to 2 do
        if B[i] < h_j then ++n_j
return C(h_2 − n_2, 2) + C(h_1 − n_1, 1)


In advance of applying Algorithm 2, we compute a table
of the conflicts between each pair of private-hand indices,
where we set IC[i][j] to 1 if the hands with indices i and j share
a card in common, and 0 otherwise. Then, we set the joint
probability D[i][j] to equal the product of the two independent
probabilities D_1[i], D_2[j] if there is no conflict between
the indices, and we set it to zero otherwise. Note that
this algorithm actually runs in O(n^2) time, where n is the
number of private hands. However, the n^2 loop only involves
the simple step of looking up an element in the IC array,
which is performed extremely quickly. The time-consuming
part of the computation is looking up the strategy
probabilities P_1, P_2, which involves accessing several elements in
the massive binary strategy file. Our algorithm performs this
task only O(n) times, while the naive approach would do
this O(n^2) times, making real-time endgame solving
intractable. (Note that each private hand consists of the two
cards p_1, p_2, so while the main loop in Algorithm 2 iterates
over both p_1 and p_2, it is only iterating once over the H
private hands and is O(n).)
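
The structure of Algorithm 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `reach_prob` is a hypothetical stand-in for the expensive strategy-table lookup, hands are kept in a plain list instead of being indexed through IndexMap, and card conflicts are tested on the fly with a set intersection instead of a precomputed IC table. A seven-card toy deck keeps the example small.

```python
from itertools import combinations


def joint_hand_distribution(live_cards, reach_prob):
    """Joint distribution over (P1 hand, P2 hand) given the betting history.

    reach_prob(player, hand) stands in for one strategy-table lookup: the
    probability that `player` takes the observed actions while holding `hand`.
    """
    hands = list(combinations(sorted(live_cards), 2))
    n = len(hands)

    # Expensive part, O(n): one lookup per player per private hand.
    d1 = [reach_prob(1, h) for h in hands]
    d2 = [reach_prob(2, h) for h in hands]
    t1, t2 = sum(d1), sum(d2)
    d1 = [p / t1 for p in d1]
    d2 = [p / t2 for p in d2]

    # Cheap part, O(n^2): multiply marginals unless the hands share a card.
    joint = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if not set(hands[i]) & set(hands[j]):
                joint[i][j] = d1[i] * d2[j]
                total += joint[i][j]
    joint = [[p / total for p in row] for row in joint]
    return hands, joint


calls = []  # records every simulated table lookup


def toy_reach_prob(player, hand):
    """Made-up reach probabilities, used only to exercise the function."""
    calls.append((player, hand))
    return 1.0 + 0.1 * player * sum(hand)


hands, joint = joint_hand_distribution(range(7), toy_reach_prob)
```

With 7 live cards there are 21 private hands, so the sketch performs 42 lookups in total, while the quadratic loop touches only in-memory values.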

Next we compute arrays E_1, E_2 that contain the equities
for each state of private information against the opponent’s
distribution. For player 1, we do this by adding D[h_1][h_2]
to E_1[h_1] for each hand h_2 such that the rank⁴ of it on the
given board is lower than that of h_1, and adding D[h_1][h_2] / 2 for
each hand with equal rank. We then normalize the entries
of E_1[h_1], and compute E_2 analogously. E_1[h_1] is now the
probability that player 2 has a hand worse than h_1, given the
prior distribution D and the current history of betting and
public cards.
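
The equity computation just described (the main loop of Algorithm 1) can be sketched as below. The three-hand joint distribution and rank list are invented purely for illustration, and the guard against a zero normalizer is an added assumption for hands that are impossible given the board.

```python
def equities(joint, rank):
    """Equity of each private hand against the opponent's conditional range.

    joint[i][j]: probability that player 1 holds hand i and player 2 holds hand j.
    rank[h]: integer strength of hand h on the current board (higher is better).
    """
    n = len(rank)
    e1, e2 = [0.0] * n, [0.0] * n
    for h1 in range(n):
        s1 = s2 = 0.0
        for h2 in range(n):
            w1 = joint[h1][h2]  # player 1 holds h1, player 2 holds h2
            w2 = joint[h2][h1]  # player 2 holds h1, player 1 holds h2
            s1 += w1
            s2 += w2
            if rank[h2] < rank[h1]:
                e1[h1] += w1
                e2[h1] += w2
            elif rank[h2] == rank[h1]:
                e1[h1] += w1 / 2  # ties count as half a win
                e2[h1] += w2 / 2
        if s1 > 0:
            e1[h1] /= s1
        if s2 > 0:
            e2[h1] /= s2
    return e1, e2


# Toy input: three hands, the last two tied in strength; the zero diagonal
# mimics the rule that both players cannot hold the same cards.
rank = [1, 2, 2]
joint = [
    [0.0, 0.25, 0.25],
    [0.125, 0.0, 0.125],
    [0.125, 0.125, 0.0],
]
e1, e2 = equities(joint, rank)
```

Because each entry is normalized by the total mass of the opponent hands compatible with it, the equities reflect the opponent's distribution induced by the trunk strategy and not a uniform one.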

⁴ The rank of a hand R[h_i] given a set of public board cards B
is an integral-valued mapping such that stronger hands on B have
a higher value; for example, a royal flush has the highest rank.

In advance of gameplay, we have computed separate
action abstractions for the endgame solver to use for each
pot/stack size that could be encountered. This allows us to
solve the “off-tree problem,” since we are taking into
account the actual pot size even if the opponent took an action
outside the action abstraction earlier in the hand. We have
constructed these abstractions so that the larger pot sizes
(which have shallower stacks) have more bet sizes available
for each history, for two reasons: the first is that the tree
is smaller in these situations due to the shallower stack sizes
(once players are “all-in,” no additional bets are allowed),
and the second is that hands with larger pot sizes are more
important, since more money is won and lost on them, and
we would like to ensure that more bet sizes are accounted
for on these hands. B_i denotes the action abstraction to use
for the given pot size at hand, and b_i denotes the number of
betting sequences of B_i, for i = 1, 2.

Next, we compute a card abstraction A_i by grouping E_i
into k_i buckets, using some clustering algorithm C_i, for
i = 1, 2. Here k_i = ⌊T / b_i⌋, where T is a parameter of the
algorithm (for our agent we used T = 7500). While much prior
work on poker has used k-means as the standard clustering
algorithm, the following example demonstrates why this
would be problematic. Suppose there are many hands with
an equity of 0.7643, and also many hands with an equity
of 0.7641. Then k-means would likely create separate clusters
for these two equity values, and possibly group hands
with very different equities (e.g., 0.2 and 0.3) together if few
hands have those equities. To address this concern we used
percentile hand strength, which also happens to be easier
to compute. To do this, we break up the interval [0,1] into
k_i regions of equal length (each of size 1 / k_i). We then group
hand h_i into bucket ⌊k_i · E_i[h_i]⌋. (For our poker agent we
actually use a slight modification of this approach where we
create a special bucket just for the hands with E_i[h_i] > α, to
ensure that the strongest hands are grouped separately (we
used α = 0.99 for our agent). Then the remaining α mass
is divided according to the previously described procedure.)
Sometimes this algorithm results in significantly fewer than
k_i buckets, since there may be zero hands with E_i within
certain intervals. We take this into account, and reduce the
number of buckets in the card abstraction accordingly before
solving the endgame. Note that the card abstractions A_i
may be very different for the two players (and have different
numbers of buckets).
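The bucketing procedure, including the top bucket and the removal of empty buckets, can be sketched as below. This is an illustration rather than the authors' implementation: the function name `percentile_buckets` is hypothetical, and the interval width α / (k − 1) for the lower buckets is an assumption based on the statement that the remaining α mass is divided evenly.

```python
import math

def percentile_buckets(equity, k, alpha):
    """Map each equity to a bucket index, then renumber to drop empty buckets."""
    width = alpha / (k - 1)  # assumed: k-1 lower buckets split [0, alpha] evenly

    def raw_bucket(e):
        if e > alpha:
            return k - 1  # reserved top bucket for the strongest hands
        return min(int(math.floor(e / width)), k - 1)

    raw = [raw_bucket(e) for e in equity]
    used = sorted(set(raw))
    renumber = {b: i for i, b in enumerate(used)}  # compact to 0..len(used)-1
    return [renumber[b] for b in raw], len(used)

equity = [0.0, 0.05, 0.30, 0.31, 0.995]
buckets, n_buckets = percentile_buckets(equity, k=5, alpha=0.99)
```

Here five buckets are requested, but only three intervals contain hands, so the abstraction is compacted to three buckets, with the hand of equity 0.995 alone in the top one.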

Finally, we compute an (exact) equilibrium in the abstracted
endgame by applying an equilibrium-finding algorithm
Q to the game with card abstractions A_i and betting
abstractions B_i. While the card abstractions were computed
independently (based on equities derived from the joint dis- 
tribution), we use the joint distribution for determining the 
probabilities that players are dealt hands from their respec- 
tive buckets when constructing the endgame. For our agent, 
we used Gurobi’s parallel LP solver (Gurobi Optimization 
2014) as Q. 


Algorithm 5 Algorithm for computing endgame information
abstractions

Inputs: Equity arrays E_i; desired number of buckets per
agent k_i; parameter for top bucket α; total number of
possible private hands H

    I ← α / (k_1 − 1)
    A_1 ← array of zeroes of size H
    U_1 ← array of booleans initialized to false of size H
    for h = 1 to H do
        if E_1[h] > α then
            b ← k_1 − 1
        else
            b ← ⌊E_1[h] / I⌋
        if U_1[b] == FALSE then
            U_1[b] ← TRUE
    M_1 ← array of zeroes of size k_1
    g ← 0
    for i = 0 to k_1 − 1 do
        M_1[i] = g
        if U_1[i] == TRUE then
            g = g + 1
    for h = 1 to H do
        if E_1[h] > α then
            A_1[h] = M_1[k_1 − 1]
        else
            A_1[h] = M_1[⌊E_1[h] / I⌋]

    Compute A_2 analogously


5 Experiments on No-Limit Texas Hold’em


We tested our algorithm against the two strongest agents
from the 2013 poker competition. The base agent was a version
of the agent we submitted to the 2014 AAAI computer
poker competition (which came in first place), taken from
shortly before the competition. Ordinarily it would be very
time consuming to differentiate the performance of the base
strategies from the endgame solver with statistical significance,
since the endgame solver plays relatively slowly (it averaged
around 8 seconds on each hand that reached the endgame, which
still kept us well within the competition time limit of 7 seconds
per hand on average, since only around 25% of hands make it to
the final betting round). A useful variance-reduction technique
is to only consider hands where both agents make it to an
endgame. In Appendix C we prove that this technique is unbiased.
The results using this evaluation metric are given in Table 1,
where the ± indicates 95% confidence intervals.


        O1            O2
     87 ± 50       29 ± 25


Table 1: Improvement by using endgame solving against the 
strongest agents from the 2013 poker competition over all 
hands where both agents made it to some endgame (i.e., to 
the river betting round). Units are milli big blinds per hand. 
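For readers unfamiliar with how a "mean ± 95% interval" entry of this kind can be produced, the following is a minimal sketch using the normal approximation. It is an assumption that an interval of this form is appropriate here; the paper does not state how its intervals were computed, and the sample values below are hypothetical, not data from the experiments.

```python
import math
import statistics

def mean_with_ci95(samples):
    """Return (mean, half_width) using the normal approximation 1.96 * s / sqrt(n)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width

# Hypothetical per-hand winnings in milli big blinds (not data from the paper).
samples = [100.0, -50.0, 200.0, 0.0, 50.0, -100.0, 150.0, 50.0]
mean, half_width = mean_with_ci95(samples)
```

The half-width shrinks with the square root of the number of hands, which is why reducing per-hand variance lets a smaller sample reach statistical significance.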


The base agent used a procedure called purification on all
rounds (except for the first preflop action); this procedure
selects the maximal probability action at each information
set with probability 1 instead of randomizing according to
the abstract equilibrium strategy (ties are broken uniformly
at random) (Ganzfried, Sandholm, and Waugh 2012). This
parameter setting was shown to be the best in our thorough 
experiments in prior years, and we had used this as the stan- 
dard setting when evaluating our base agent. The main mo- 
tivation for purification is that it compensates for the fail- 
ure of iterative equilibrium-finding algorithms to fully con- 
verge to equilibrium in the abstract game (a phenomenon 
that has been documented by prior agents, e.g., (Ganzfried 
and Sandholm 2012)). The endgame solving agent did not 
use any rounding for the river, as the endgame equilibria 
are exact (within the chosen abstraction), and the problem 
of the equilibrium-finding algorithm failing to converge is 
not present. Both agents used the pseudoHarmonic action 
translation mapping (Ganzfried and Sandholm 2013) for all 
rounds to interpret actions taken by the opponent that fall 
outside of the action abstraction. 
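Purification as described above can be sketched in a few lines. This is an illustration only: the function name `purify` and the dictionary representation of a mixed strategy at one information set are assumptions, not the authors' code.

```python
import random

def purify(strategy, rng=random):
    """Return a pure strategy: probability 1 on a maximal-probability action."""
    best = max(strategy.values())
    candidates = [a for a, p in strategy.items() if p == best]
    chosen = rng.choice(candidates)  # ties are broken uniformly at random
    return {a: (1.0 if a == chosen else 0.0) for a in strategy}

mixed = {"fold": 0.1, "call": 0.6, "raise": 0.3}
pure = purify(mixed)
```

For the mixed strategy above, the purified strategy always calls. When two actions share the maximal probability, one of them is picked at random and receives probability 1.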

The results are from 100 duplicate matches against O1 
and 155 duplicate matches against O2. Since each match is 
3000 hands, this means we played 600,000 hands against 
O1 and 930,000 hands against O2. Out of these hands, both
versions of our agent made it to the river round on 173,568 
hands against O1 and on 318,700 hands against O2. If we 
had used the standard duplicate approach for evaluating per- 
formance, we would not have been able to statistically dif- 
ferentiate the base agent from the endgame solver over this 
sample. However, we were able to obtain statistically signif- 
icant results using our new evaluation approach. 


6 Conclusions and Future Research 


We demonstrated that endgame solving can be successful 
in practice in large imperfect-information games despite the 
fact that the strategies it computes are not guaranteed to
constitute an equilibrium in the full game (which we showed).
We also showed that endgame solving guarantees a low ex- 
ploitability in certain games, and presented a framework 
that can be used to evaluate its applicability more broadly. 
We described several benefits of endgame solving in large 
imperfect-information games, including exact computation 
of Nash equilibria in abstracted endgames, the ability to 
compute certain equilibrium refinements, the ability to com- 
pute finer-grained, history-aware, and strategy-biased ab- 
stractions in endgames, and a solution to the off-tree prob- 
lem. We presented an efficient algorithm for performing 
endgame solving in very large imperfect-information games, 
and showed that our algorithm led to a significantly stronger 
performance against the strongest agents from the 2013 
computer poker competition. 

This work opens many interesting avenues for future re- 
search. We showed that endgame solving can produce strate- 
gies with high exploitability in certain games, while it guar- 
antees low exploitability in others. It would be interesting to 
study where different game classes fall on this spectrum. It is 
possible that for interesting classes of games—perhaps even 
classes that include variants of poker—endgame solving is 
guaranteed to produce strategies with low exploitability. It 
would also be interesting to study various subdivisions of a 
game into a trunk and endgames and to experiment on addi- 
tional game classes. 
A No-Limit Texas Hold’em Poker


No-limit Texas Hold’em is the most popular variant of poker 
among humans, and the two-player version is the game of 
most active research in the computer poker community cur- 
rently. This game works as follows. Initially two players 
each have a stack of chips (worth $20,000 in the computer 
poker competition). One player, called the small blind, ini- 
tially puts $50 worth of chips in the middle, while the other 
player, called the big blind, puts $100 worth of chips in the 
middle. The chips in the middle are known as the pot, and 
will go to the winner of the hand. 

Next, there is an initial round of betting. The player whose 
turn it is can choose from three available options: 


• Fold: Give up on the hand, surrendering the pot to the
opponent.


• Call: Put in the minimum number of chips needed to
match the number of chips put into the pot by the opponent.
For example, if the opponent has put in $1000 and
we have put in $400, a call would require putting in $600
more. A call of zero chips is also known as a check.


• Bet: Put in additional chips beyond what is needed to call.
A bet can be of any size up to the number of chips a player
has left in his stack. If the opponent has just bet, then our
additional bet is also called a raise.


The initial round of betting ends if a player has folded,
if there has been a bet and a call, or if both players have
checked. If the round ends without a player folding, then
three public cards are revealed face-up on the table (called
the flop) and a second round of betting occurs. Then one
more public card is dealt (called the turn) and a third round
of betting occurs, followed by a fifth public card (called the
river) and a final round of betting. If a player ever folds, the
other player wins all the chips in the pot. If the final betting
round is completed without a player folding, then both players
reveal their private cards, and the player with the best hand
wins the pot (it is divided equally if there is a tie).

In the experiments, we will be solving endgames after the 
final public card is dealt but before the final round of betting. 
(Thus, the endgame contains no more chance events, and 
only publicly observable actions of both players remain.) 


B Example Demonstrating Our 
Endgame-Solving Algorithm on No-Limit 
Texas Hold’em 


In this section we demonstrate the operation of our algo- 
rithm on an example hand of no-limit Texas Hold’em. Recall 
that blinds are $50 and $100 and that both players start with 
$20,000. In the example hand, we are in the small blind with 
8dTh. We raise to $250, the opponent re-raises to $750, and 
we call (there is now $1500 in the pot). The flop is Jc6s2c. 
The opponent checks and we check. The turn is Kd. The 
opponent checks, we bet $375, and he calls (there is now 
$2250 in the pot). The river is Qc. Up until this point we 
have just played according to the precomputed base strat- 
egy; the endgame-solving algorithm begins now. 

According to the pseudocode for Algorithm 1, the first 
step is to compute the joint prior hand distribution D from 
the base strategies, using Algorithm 2. This took 0.433 seconds.
We then compute the equities E_i for each player, using
Algorithm 1. This took 0.015 seconds. 

The next step is to look at the betting abstraction that has 
been precomputed for this specific pot/stack size (pot size of 
$2250 and stack sizes of $18875). Note that for this partic- 
ular hand all of the opponent’s actions before the river fell 
inside of our betting abstraction; however, if they had not, 
and we were forced to use an action translation mapping to 
map his action to an action in our betting abstraction, we 
would be able to correct our misperception of the pot size at 
this point, by selecting the precomputed betting abstraction 
for the actual pot/stack size (as opposed to the size that as- 
sumed he played an action in our betting abstraction). This 
solves the “off-tree” problem, discussed in the paper. 
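The pot and stack sizes quoted for this hand follow directly from the betting history. The short sketch below redoes that bookkeeping; the variable names are hypothetical and the per-round amounts are read off the example hand described above.

```python
START_STACK = 20_000

# (our chips added, opponent chips added) per betting round, blinds included.
rounds = {
    "preflop": (750, 750),  # we raise to 250, opponent re-raises to 750, we call
    "flop": (0, 0),         # both players check
    "turn": (375, 375),     # we bet 375, opponent calls
}

ours = sum(a for a, _ in rounds.values())
theirs = sum(b for _, b in rounds.values())
pot = ours + theirs
stack = START_STACK - ours
```

This gives a pot of 2250 and remaining stacks of 18875, the pot/stack pair used to select the precomputed betting abstraction.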

The betting abstraction for a pot size of $2250 has 196 
betting sequences for each player. For this hand we used a 
betting abstraction parameter of T = 10000 (while for the 
experiments described in the paper, we used T = 7500). 
Therefore, we will use k_i = ⌊10000 / 196⌋ = 51 card buckets for
each player for this hand. 
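The bucket count is an integer division of the parameter T by the number of betting sequences. A minimal sketch, assuming the floor reading of the formula (the function name `num_card_buckets` is hypothetical):

```python
def num_card_buckets(T, num_betting_sequences):
    """k_i = floor(T / b_i): fewer card buckets when the betting tree is larger."""
    return T // num_betting_sequences

k_example = num_card_buckets(10_000, 196)     # setting used for this example hand
k_experiments = num_card_buckets(7_500, 196)  # same abstraction, T from the experiments
```

With 196 betting sequences, T = 10000 yields 51 buckets, while T = 7500 would yield 38 for the same betting abstraction.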

Next, we compute card abstractions for both players. We
used a top bucket parameter of α = 0.995 (while for
the experiments described in the paper, we used α = 0.99).
After applying our card abstraction algorithm for both play- 
ers, the resulting abstractions had 38 and 35 buckets respec- 
tively for the two players (since not all of the 51 hand equity 
intervals contained hands). Computing these took 0.008 sec- 
onds. 


Our actual hand (8dTh) had rank 296 (out of 1081) and 
actually had an equity of 0 vs. the opponent’s hand distribu- 
tion (we thought the opponent would never play the hand the 
way he did so far with a worse hand than 8dTh). This places 
us in bucket 0 (the worst bucket, out of 35). By contrast, 
if the opponent had our hand, he would have an equity of 
0.336 against our hand distribution, and would be in bucket 
8 (where his buckets range from 0-37). 

We then construct the LP matrices for the resulting ab- 
stracted endgame, which took 0.15 seconds, and then com- 
pute an exact equilibrium by solving the LP using Gurobi’s 
parallel LP solver (it took 1.051 seconds to construct the 
LP instance and 5.328 seconds to solve it). Overall, the 
endgame-solving algorithm took 6.985 seconds for this 
hand. 
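The overall running time is the sum of the step timings reported in this appendix. The sketch below adds them up in integer milliseconds to avoid floating-point rounding; the dictionary keys are descriptive labels chosen for illustration.

```python
# Timings reported for the example hand, in milliseconds.
timings_ms = {
    "joint hand distribution": 433,
    "equities": 15,
    "card abstractions": 8,
    "LP matrices": 150,
    "LP instance construction": 1051,
    "LP solve": 5328,
}

total_ms = sum(timings_ms.values())
solve_share = timings_ms["LP solve"] / total_ms
```

The total is 6985 ms, matching the 6.985 seconds reported, and the LP solve alone accounts for roughly three quarters of it.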

The opponent checked for his initial action on the river. 
The betting abstraction for this hand had nine available op- 
tions for the first action for each player: check, 0.1 pot, 3 pot, 
2 pot, pot, 1.5 pot, 2 pot, 3 pot, all-in. The strategy from our 
endgame solver said for us to check with probability 0.742, 
bet 2 pot with probability 0.140, bet pot with probability 
0.103, and bet 2 pot with probability 0.014. 


C Variance-Reduction Technique 


When comparing the performance of one version of an agent,
A_1, to another version, A_2, that is identical except that it
plays differently on endgames, one would like to take advantage
of the fact that the agents play identically up until the
endgames in order to evaluate the performance difference
more efficiently. Ideally, we could play A_1 against a given
opponent, and when the endgame is reached, evaluate how
both A_1 and A_2 would do on that same endgame given the
trunk history. However, such a technique is not possible on
the poker competition test server. All that is allowed is to
play A_1 and A_2 against an opponent for a full set of matches.
The agents may reach endgames on different hands, or may
reach different endgames even on the same hands (since
both our agent and the opponent may be playing randomized
strategies before the endgames).

One possible approach for reducing variance would be
to only consider hands where both A_1 and A_2 arrive at
the same endgame (the same betting history was played).
It turns out that this approach is actually biased, so it cannot
be applied to accurately measure performance. A second
approach, which turns out to be unbiased, would be to only
consider the hands where both agents arrive at some endgame
(though not necessarily the same one). If we only consider
these hands, then the difference in performance between the
two agents is an unbiased estimator of their true performance
difference. This would allow us to achieve statistical
significance using a smaller sample of hands.


Proposition 3. Let A_1 and A_2 be two algorithms that differ
in play only for endgames. Then the difference in perfor- 
mance looking at only the hands where both make it to the 
same endgame is not an unbiased estimator of the overall 
performance difference. 


Proof. Suppose there were only two betting sequences and
both make it to the river, where the first one (A) happens
99% of the time and the second one (B) happens 1% of the
time. Then the probability that both agents hit the river with
B on any particular hand is 0.01%, and the probability that
both agents hit the river with A on any particular hand is
98.01%. So if you look at all hands where both hit the river
with the same sequence, there would be only 1 (B) for every
9801 (A) sequences, whereas in actual play B occurs once for
every 99 occurrences of A. The restricted sample therefore
underweights B, so whenever the performance difference on B
differs from that on A, the resulting estimate is biased.
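The reweighting in this counterexample can be checked numerically. The sketch below compares the true expected performance gap with the gap measured only on hands where both runs reach the same sequence. The per-sequence gaps in `diff` are hypothetical values chosen to make the bias visible, and the function name `estimates` is an assumption.

```python
def estimates(p, diff):
    """p[s]: probability of betting sequence s; diff[s]: true performance gap on s."""
    true_gap = sum(p[s] * diff[s] for s in p)
    # Keeping only hands where both runs reach the same sequence weights s by p[s]**2.
    same = {s: p[s] ** 2 for s in p}
    z = sum(same.values())
    filtered_gap = sum(same[s] * diff[s] for s in p) / z
    return true_gap, filtered_gap

p = {"A": 0.99, "B": 0.01}
diff = {"A": 0.0, "B": 100.0}  # hypothetical gaps, not measured values
true_gap, filtered_gap = estimates(p, diff)
```

With these numbers the true gap is 1.0, while the same-endgame filter reports about 0.0102, roughly a hundred times too small, because sequence B is nearly absent from the filtered sample.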


Proposition 4. Let A_1 and A_2 be two algorithms that differ
in play only for endgames. Then the difference in performance
looking at only the hands where both make it to
some (but not necessarily the same) endgame is an unbiased
estimator of the overall performance difference.


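Proof. For each history that leads into an endgame h_i, let
p_i be the probability that h_i is played when we use the base
strategy against the opponent O. Then the expectation of the
difference in payoff between playing A_1 (base strategy) and
A_2 (endgame solver) against O is

\begin{align*}
&\sum_i \left[ p_i \left( U(A_1, O, h_i) - U(A_2, O, h_i) \right) \right] \\
&= \sum_i \left[ p_i U(A_1, O, h_i) \right] - \sum_i \left[ p_i U(A_2, O, h_i) \right].
\end{align*}

Suppose that we look at performance over all hands where
both algorithms make it to some endgame. The probability
that A_1 makes it to the endgame with history h_i and A_2
makes it to the endgame with history h_j is p_i p_j. Thus, the
expectation of the payoff difference is

\begin{align*}
&\sum_i \sum_j \left[ p_i p_j \left( U(A_1, O, h_i) - U(A_2, O, h_j) \right) \right] \\
&= \sum_i \sum_j \left[ p_i p_j U(A_1, O, h_i) \right] - \sum_i \sum_j \left[ p_i p_j U(A_2, O, h_j) \right] \\
&= \sum_i \Big[ p_i U(A_1, O, h_i) \sum_j p_j \Big] - \sum_j \Big[ p_j U(A_2, O, h_j) \sum_i p_i \Big] \\
&= \sum_i \left[ p_i U(A_1, O, h_i) \right] - \sum_j \left[ p_j U(A_2, O, h_j) \right] \\
&= \sum_i \left[ p_i U(A_1, O, h_i) \right] - \sum_i \left[ p_i U(A_2, O, h_i) \right].
\end{align*}

The step that drops the inner sums uses \sum_i p_i = 1, that is,
the p_i are taken as normalized over the histories that lead
into an endgame. The final expression equals the expected
payoff difference computed above.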
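The identity in this proof can be checked numerically on a small example. The sketch below compares the history-by-history expectation with the expectation over independent pairs of histories. The probabilities and payoffs are hypothetical, and the function names are chosen for illustration; the probabilities are assumed to sum to 1.

```python
def gap_by_history(p, u1, u2):
    """Sum_i p_i * (U1_i - U2_i): both agents evaluated on the same history."""
    return sum(pi * (a - b) for pi, a, b in zip(p, u1, u2))

def gap_over_pairs(p, u1, u2):
    """Sum_{i,j} p_i p_j (U1_i - U2_j): agents reach independent histories."""
    return sum(pi * pj * (a - b)
               for pi, a in zip(p, u1)
               for pj, b in zip(p, u2))

p = [0.5, 0.3, 0.2]      # assumed to sum to 1
u1 = [10.0, -4.0, 2.0]   # hypothetical payoffs of A_1 per history
u2 = [12.0, -1.0, 0.0]   # hypothetical payoffs of A_2 per history

same_history = gap_by_history(p, u1, u2)
any_endgame = gap_over_pairs(p, u1, u2)
```

Both quantities come out to -1.5 here, in line with the claim that restricting to hands where both agents reach some endgame leaves the expected difference unchanged.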


